[mathcore] Fix TKDTreeBinning::FindBin returning out-of-range bins #22649
Open
guitargeek wants to merge 4 commits into
Open
[mathcore] Fix TKDTreeBinning::FindBin returning out-of-range bins #22649guitargeek wants to merge 4 commits into
guitargeek wants to merge 4 commits into
Conversation
1 task
Test Results 22 files 22 suites 3d 19h 24m 34s ⏱️ For more details on these failures, see this check. Results for commit 1c6eafc. ♻️ This comment has been updated with latest results. |
hageboeck
approved these changes
Jun 19, 2026
hageboeck
left a comment
Member
There was a problem hiding this comment.
It looks correct, but see a few minor comments.
The test had no failure return code, so errors would go undetected.
TKDTreeBinning computed fNBins independently from the kd-tree it builds. SetNBins picks a bucket size of fDataSize / nBins (integer division), but the underlying TKDTree creates ceil(fNPoints / bucketSize) terminal nodes. When the requested number of bins does not divide the data size evenly, the bucket size is rounded down and the tree ends up with more terminal nodes than fNBins. Since FindBin returns a terminal-node index, it could return a value >= GetNBins(), pointing at a bin with no edges and no content: ```c++ binning.GetNBins() # 1001 binning.FindBin([1.2, 0.5, 0.5, 0.5, 0.5]) # 1004 binning.GetBinContent(1004) # RuntimeWarning: No such bin ``` Set fNBins to the actual number of terminal nodes (GetNNodes() + 1) after building the tree, so FindBin always returns an index within [0, GetNBins()) and every bin has valid edges and content. This also self-corrects existing objects on read-back, as the Streamer rebuilds the tree via SetNBins. Also fix SetBinsContent to read the per-node point counts directly from the tree (GetNPointsNode) instead of an incorrect last-bin heuristic, so bin contents are exact and sum to the data size. Add a regression test covering the non-evenly-dividing case. Closes root-project#10786. 🤖 Done with the help of AI.
Add a regression test covering the non-evenly-dividing TKDTreeBinning case. This covers the bug reported in GitHub issue root-project#10786. 🤖 Done with the help of AI.
lmoneta
approved these changes
Jun 19, 2026
lmoneta
left a comment
Member
There was a problem hiding this comment.
LGTM!
Just a very minor comment. Thanks for fixing also this old issue
| @@ -28,30 +28,36 @@ using std::cout; | |||
| using std::cerr; | |||
| using std::endl; | |||
|
|
|||
Member
There was a problem hiding this comment.
Maybe you could also do using std::size_t to avoid writing std::size_t
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TKDTreeBinning computed fNBins independently from the kd-tree it builds. SetNBins picks a bucket size of fDataSize / nBins (integer division), but the underlying TKDTree creates ceil(fNPoints / bucketSize) terminal nodes. When the requested number of bins does not divide the data size evenly, the bucket size is rounded down and the tree ends up with more terminal nodes than fNBins. Since FindBin returns a terminal-node index, it could return a value >= GetNBins(), pointing at a bin with no edges and no content:
Set fNBins to the actual number of terminal nodes (GetNNodes() + 1) after building the tree, so FindBin always returns an index within [0, GetNBins()) and every bin has valid edges and content. This also self-corrects existing objects on read-back, as the Streamer rebuilds the tree via SetNBins.
Also fix SetBinsContent to read the per-node point counts directly from the tree (GetNPointsNode) instead of an incorrect last-bin heuristic, so bin contents are exact and sum to the data size.
Add a regression test covering the non-evenly-dividing case.
Closes #10784.
🤖 Done with the help of AI.